Abstract: We present a method to generate a robot control
strategy that maximizes the probability to accomplish a task. 
The task is given as a Linear Temporal Logic (LTL) formula 
over a set of properties that can be satisfied at the regions of a 
partitioned environment. We assume that the probabilities with 
which the properties are satisfied at the regions are known, and 
the robot can determine the truth value of a proposition only at 
the current region. Motivated by several results on partition-based
abstractions, we assume that the motion is performed on
a graph. To account for noisy sensors and actuators, we assume 
that a control action enables several transitions with known 
probabilities. We show that this problem can be reduced to the 
problem of generating a control policy for a Markov Decision 
Process (MDP) such that the probability of satisfying an LTL 
formula over its states is maximized. We provide a complete 
solution for the latter problem that builds on existing results 
from probabilistic model checking. We include an illustrative 
case study. 

I. Introduction 

Recently there has been an increased interest in using 
temporal logics, such as Linear Temporal Logic (LTL) and 
Computation Tree Logic (CTL), as motion specification languages
for robotics [1]-[6]. Temporal logics are appealing
because they provide formal, high level languages in which 
to describe complex missions, e.g., "Reach A, then B, and 
then C, in this order, infinitely often. Never go to A. Don't 
go to B unless C or D were visited." In addition, off-the- 
shelf model checking algorithms [7], [8] and temporal logic 
game strategies [9] can be used to verify the correctness of 
robot trajectories and to synthesize robot control strategies. 

Motivated by several results on finite abstractions of con- 
trol systems, in this paper we assume that the motion of the 
robot in the environment is modeled as a finite labeled tran- 
sition system. This can be obtained by simply partitioning 
the environment and labeling the edges of the corresponding 
quotient graph according to the motion capabilities of the 
robot among the regions. Alternatively, the partition can be 
made in the state space of the robot dynamics, and the 
transition system is then a finite abstraction of a continuous 
or hybrid control system [10], [11]. 

The problem of controlling a finite transition system from 
a temporal logic specification has received a lot of attention 
during recent years. All the existing works assume that the
current state can be precisely determined. If the result of a 
control action is deterministic (i.e., at each state, an available 
control enables exactly one transition), control strategies 
from specifications given as LTL formulas can be found 
through a simple adaptation of off-the-shelf model checking 
algorithms [3]. If the control is nondeterministic (an available 
control at a state enables one of several transitions, and 
their probabilities are not known), the control problem from 
an LTL specification can be mapped to the solution of a 
Rabin game [12], or simpler Büchi and GR(1) games if the
specification is restricted to fragments of LTL [1]. If the 
control is probabilistic (an available control at a state enables 
one of several transitions, and their probabilities are known), 
the transition system is a Markov Decision Process (MDP). 
The control problem then reduces to generating a policy 
(adversary) for an MDP such that the produced language 
satisfies a formula of a probabilistic temporal logic [13], 
[14]. We have recently developed a framework for deriving 
an MDP control strategy from a formula in a fragment of 
probabilistic CTL (pCTL) [15]. For probabilistic LTL, in 
[16], a control strategy is synthesized for an MDP where 
some states are under control of the environment, so that 
an LTL specification is guaranteed to be satisfied under all 
possible environment behaviors. The temporal logic control 
problems for systems with probabilistic or nondeterministic 
state-observation models, which include the class of Partially 
Observable Markov Decision Processes [17], [18], are cur- 
rently open. 

In this paper, we consider motion specifications given as 
arbitrary LTL formulas over a set of properties that can 
be satisfied with given probabilities at the vertices of a 
graph environment. We assume that the truth values of the 
properties can be observed only when a vertex is reached in 
the environment, and the observations of these properties are
independent of each other. We assume a probabilistic robot
control model and that the robot can determine its current 
vertex precisely. Under these assumptions, we develop an 
algorithm to generate a control strategy that maximizes the 
probability of satisfying the specification. Our approach is 
based on mapping this problem to the problem of generating 
a control policy for an MDP such that the probability of satisfying
an LTL formula is maximized. We provide a solution
to this problem by drawing inspiration from probabilistic 
model checking. We illustrate the method by applying it 
to a numerical example of a robot navigating in an indoor 
environment. 

The contribution of this work is twofold. First, we adapt 



existing approaches in probabilistic model checking (e.g.,
[19], [20]), and provide a complete solution to the general 
problem of controlling MDPs from full LTL specifications 
using deterministic Rabin automata. This is a significant de- 
parture from our previous work on MDP control from pCTL 
formulas [15], since it allows for strictly richer specifications. 
The increase in expressivity is particularly important in many 
robotic applications where the robot is expected to perform 
some tasks, such as surveillance, repeatedly. However, it 
comes at the price of increased computational complexity. 
Second, we allow for non-determinism not only in the robot 
motion, but also in the robot's observation of properties in 
the environment. This allows us to model a large class of 
robotic problems in which the satisfaction of properties of 
interest can be predicted only probabilistically. For example, 
we can model a task where a robot is operating in an indoor 
environment, and is required to pick up and deliver items
among some rooms. The robot determines its current location 
using RFID tags on the floors and walls. Non-determinism 
occurs in observations because items may or may not be 
available when a robot visits a room. Non-determinism also 
occurs in the motion due to imprecise localization or control 
actuation. 

The remainder of the paper is organized as follows:
In Section II we introduce the necessary definitions and
preliminary results. In Section III we formulate the problem
and describe the technical approach. In Section IV we
reformulate this problem on an MDP and show that the two
problems are equivalent. We synthesize our control strategy
in Section V, and an example of the provided algorithm is
shown in Section VI. We conclude in Section VII.

II. Preliminaries 

In this section we provide background material on linear 
temporal logic and Markov decision processes. 

A. Linear Temporal Logic 

We employ Linear Temporal Logic (LTL) to describe high 
level motion specifications. A detailed description of the 
syntax and semantics of LTL is beyond the scope of this 
paper and can be found in, for example, [7]. Roughly, an 
LTL formula is built up from a set of atomic propositions
$\Pi$, which are properties that can be either true or false,
standard Boolean operators $\neg$ (negation), $\vee$ (disjunction),
$\wedge$ (conjunction), and $\Rightarrow$ (implication), and temporal
operators $\bigcirc$ (next), $\mathcal{U}$ (until), $\Diamond$ (eventually),
and $\Box$ (always). The semantics of LTL formulas are given over
words, where a word is an infinite sequence $o = o_0 o_1 \ldots$
with $o_i \in 2^{\Pi}$ for all $i$.

We say $o \models \phi$ if the word $o$ satisfies the LTL formula $\phi$.
The semantics of LTL is defined recursively. If $\phi = \pi$ is
an LTL formula, where $\pi \in \Pi$, then $\phi$ is true at position
$i$ of the word if $\pi \in o_i$. A word satisfies an LTL formula
$\phi$ if $\phi$ is true at the first position of the word; $\Box\phi$ means
that $\phi$ is true at all positions of the word; $\Diamond\phi$ means that
$\phi$ eventually becomes true in the word; $\phi_1 \,\mathcal{U}\, \phi_2$ means $\phi_2$
eventually becomes true and $\phi_1$ is true until this happens;
$\bigcirc\phi$ means that $\phi$ becomes true at the next position of the word.
More expressivity can be achieved by combining the above 



temporal and Boolean operators (several examples are given 
later in the paper). An LTL formula can be represented by a 
deterministic Rabin automaton, which is defined as follows. 

Definition 2.1 (Deterministic Rabin Automaton):
A deterministic Rabin automaton (DRA) is a tuple
$\mathcal{R} = (Q, \Sigma, \delta, q_0, F)$, where (i) $Q$ is a finite set of states;
(ii) $\Sigma$ is a set of inputs (alphabet); (iii) $\delta : Q \times \Sigma \to Q$ is
the transition function; (iv) $q_0 \in Q$ is the initial state; and
(v) $F = \{(L_1, K_1), \ldots, (L_k, K_k)\}$ is a set of pairs where
$L_i, K_i \subseteq Q$ for all $i \in \{1, \ldots, k\}$.

A run of a Rabin automaton $\mathcal{R}$, denoted by $r_{\mathcal{R}} = q_0 q_1 \ldots$,
is an infinite sequence of states in $\mathcal{R}$ such that for each $i \geq 0$,
$q_{i+1} = \delta(q_i, \alpha)$ for some $\alpha \in \Sigma$. A run $r_{\mathcal{R}}$ is accepting if
there exists a pair $(L, K) \in F$ such that 1) there exists
$n \geq 0$ such that for all $m \geq n$, we have $q_m \notin L$, and 2)
there exist infinitely many indices $k$ where $q_k \in K$. This
acceptance condition means that $r_{\mathcal{R}}$ is accepting if, for some
pair $(L, K) \in F$, $r_{\mathcal{R}}$ intersects $L$ finitely many times
and $K$ infinitely many times.
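
To make the acceptance condition concrete, the following is a minimal sketch that checks it for an ultimately periodic run, i.e., a finite prefix followed by a cycle repeated forever. The restriction to such "lasso" runs, the function name `rabin_accepts`, and the state names in the usage example are assumptions made for illustration; they do not come from the paper.

```python
def rabin_accepts(prefix, cycle, pairs):
    """Check Rabin acceptance for the run prefix . cycle^omega.

    prefix, cycle: lists of automaton states (cycle must be non-empty).
    pairs: iterable of (L, K), each a set of automaton states.
    """
    if not cycle:
        raise ValueError("cycle must be non-empty")
    # In a lasso run, exactly the states on the cycle occur infinitely often;
    # the prefix is irrelevant to acceptance.
    recurring = set(cycle)
    for L, K in pairs:
        # L is visited finitely often iff no recurring state lies in L;
        # K is visited infinitely often iff some recurring state lies in K.
        if not (recurring & set(L)) and (recurring & set(K)):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical single accepting pair with L empty and K = {q3, q4}.
    pairs = [(set(), {"q3", "q4"})]
    print(rabin_accepts(["q0"], ["q1", "q3"], pairs))  # cycle visits q3
    print(rabin_accepts(["q3"], ["q1"], pairs))        # q3 only in the prefix
```

Only the cycle matters because a state in the prefix is visited finitely many times by definition.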

For any LTL formula $\phi$ over $\Pi$, one can construct a DRA
with input alphabet $\Sigma = 2^{\Pi}$ accepting all and only words
over $\Pi$ that satisfy $\phi$ (see [21]). We refer readers to [22]
and references therein for algorithms and to freely available
implementations, such as [23], to translate an LTL formula
over $\Pi$ to a corresponding DRA.

B. Markov Decision Process and probability measure 

We now introduce a labeled Markov decision process, and 
the probability measure we will use in the upcoming sections. 

Definition 2.2 (Labeled Markov Decision Process): A labeled
Markov decision process (MDP) is a tuple
$\mathcal{M} = (S, U, A, \mathcal{P}, \iota, \Pi, h)$, where (i) $S$ is a finite set of states; (ii)
$U$ is a finite set of actions; (iii) $A : S \to 2^{U}$ represents the set
of actions enabled at state $s \in S$; (iv) $\mathcal{P} : S \times U \times S \to [0, 1]$
is the transition probability function such that for all states
$s \in S$, $\sum_{s' \in S} \mathcal{P}(s, u, s') = 1$ if $u \in A(s) \subseteq U$ and
$\mathcal{P}(s, u, s') = 0$ if $u \notin A(s)$; (v) $\iota : S \to [0, 1]$ is the initial
state distribution satisfying $\sum_{s \in S} \iota(s) = 1$; (vi) $\Pi$ is a set
of atomic propositions; and (vii) $h : S \to 2^{\Pi}$ is a labeling
function.

The quantity $\mathcal{P}(s, u, s')$ represents the probability of
reaching the state $s'$ from $s$ taking the control $u \in A(s)$.

We will now define a probability measure over paths in the
MDP. To do this, we define an action function as a function
$\mu : S \to U$ such that $\mu(s) \in A(s)$ for all $s \in S$. An infinite
sequence of action functions $M = \{\mu_0, \mu_1, \ldots\}$ is called a
policy. One can use a policy to resolve all nondeterministic
choices in an MDP by applying the action $\mu_k(s_k)$ at each
time-step $k$. Given an initial state $s_0$ such that $\iota(s_0) > 0$,
an infinite sequence $r_{\mathcal{M}}^{M} = s_0 s_1 \ldots$ on $\mathcal{M}$ generated under
a policy $M = \{\mu_0, \mu_1, \ldots\}$ is called a path on $\mathcal{M}$ if
$\mathcal{P}(s_i, \mu_i(s_i), s_{i+1}) > 0$ for all $i$. The subsequence $s_0 s_1 \ldots s_n$
is called a finite path. If $\mu_i = \mu$ for all $i$, then we call this
policy a stationary policy.

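We define $\mathit{Paths}_{\mathcal{M}}^{M}$ and $\mathit{FPaths}_{\mathcal{M}}^{M}$ as the sets of all infinite
and finite paths of $\mathcal{M}$ under a policy $M$ starting from
any state $s_0$ where $\iota(s_0) > 0$. We can then define a
probability measure over the set $\mathit{Paths}_{\mathcal{M}}^{M}$ of paths. For a
path $r_{\mathcal{M}}^{M} = s_0 s_1 \ldots s_n s_{n+1} \ldots \in \mathit{Paths}_{\mathcal{M}}^{M}$, the prefix of
length $n$ of $r_{\mathcal{M}}^{M}$ is the finite subsequence $s_0 s_1 \ldots s_n$. Let
$\mathit{Paths}_{\mathcal{M}}^{M}(s_0 s_1 \ldots s_n)$ denote the set of all paths in $\mathit{Paths}_{\mathcal{M}}^{M}$
with the prefix $s_0 s_1 \ldots s_n$. (Note that $s_0 s_1 \ldots s_n$ is a finite
path in $\mathit{FPaths}_{\mathcal{M}}^{M}$.)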

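Then, the probability measure $\Pr_{\mathcal{M}}^{M}$ on the smallest $\sigma$-algebra
over $\mathit{Paths}_{\mathcal{M}}^{M}$ containing $\mathit{Paths}_{\mathcal{M}}^{M}(s_0 s_1 \ldots s_n)$ for
all $s_0 s_1 \ldots s_n \in \mathit{FPaths}_{\mathcal{M}}^{M}$ is the unique measure satisfying

$$\Pr_{\mathcal{M}}^{M}\{\mathit{Paths}_{\mathcal{M}}^{M}(s_0 s_1 \ldots s_n)\} = \iota(s_0) \prod_{0 \leq i < n} \mathcal{P}(s_i, \mu_i(s_i), s_{i+1}). \quad (1)$$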
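
The measure of the set of paths sharing a finite prefix is the initial probability of the first state multiplied by the transition probabilities along the prefix, with the action at step $i$ chosen by the $i$-th action function of the policy. The sketch below computes this quantity for a small MDP. The dictionary-based encoding, the function name `prefix_probability`, and the toy numbers are assumptions made for illustration.

```python
def prefix_probability(path, iota, P, policy):
    """Probability of the set of paths that start with the finite prefix `path`.

    path:   list of states s_0, ..., s_n
    iota:   dict state -> initial probability
    P:      dict (s, u, s_next) -> transition probability (missing keys mean 0)
    policy: function (k, s) -> action applied at time-step k in state s
    """
    prob = iota.get(path[0], 0.0)
    for k in range(len(path) - 1):
        u = policy(k, path[k])
        prob *= P.get((path[k], u, path[k + 1]), 0.0)
    return prob


if __name__ == "__main__":
    # Hypothetical two-state MDP with a single action "a".
    iota = {"s0": 1.0}
    P = {("s0", "a", "s0"): 0.3, ("s0", "a", "s1"): 0.7, ("s1", "a", "s1"): 1.0}
    stationary = lambda k, s: "a"  # a stationary policy ignores k
    print(prefix_probability(["s0", "s0", "s1"], iota, P, stationary))
```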

Finally, we can define the probability that a policy $M$ in an
MDP $\mathcal{M}$ satisfies an LTL formula $\phi$. A path $r_{\mathcal{M}}^{M} = s_0 s_1 \ldots$
deterministically generates a word $o = o_0 o_1 \ldots$ where
$o_i = h(s_i)$ for all $i$. With a slight abuse of notation, we denote
$h(r_{\mathcal{M}}^{M})$ as the word generated by $r_{\mathcal{M}}^{M}$. Given an LTL formula
$\phi$, one can show that the set $\{r_{\mathcal{M}}^{M} \in \mathit{Paths}_{\mathcal{M}}^{M} : h(r_{\mathcal{M}}^{M}) \models \phi\}$
is measurable. We define

$$\Pr_{\mathcal{M}}^{M}(\phi) := \Pr_{\mathcal{M}}^{M}\{r_{\mathcal{M}}^{M} \in \mathit{Paths}_{\mathcal{M}}^{M} : h(r_{\mathcal{M}}^{M}) \models \phi\} \quad (2)$$

as the probability of satisfying $\phi$ for $\mathcal{M}$ under policy $M$.
For more details about probability measures on MDPs under
a policy and measurability of LTL formulas, we refer the
reader to a text in probabilistic model checking, such as [19].

III. Model, Problem Formulation, and Approach 

In this section we formalize the environment model, the 
robot motion model, and the robot observation model. We 
then formally state our problem and provide a summary of 
our technical approach. 

A. Environment, task, and robot model 

1) Environment model: In this paper, we consider a robot
moving in a partitioned environment, which can be represented
by a graph and a set of properties:

$$\mathcal{E} = (V, \delta_{\mathcal{E}}, \Pi), \quad (3)$$

where $V$ is the set of vertices, $\delta_{\mathcal{E}} \subseteq V \times V$ is the relation
modeling the set of edges, and $\Pi$ is the set of properties
(or atomic propositions). Such a finite representation of
the environment can be obtained by using popular partition
schemes, such as triangulations or rectangular grids. The set
$V$ can be considered as a set of labels for the regions in
the partitioned environment, and $\delta_{\mathcal{E}}$ is the corresponding
adjacency relation. In this paper we assume that there is
no blocking vertex in $V$ (i.e., all vertices have at least one
outgoing edge).

2) Task specification: The atomic propositions $\Pi$ represent
properties in the environment that can be true or false.
We require the motion of the robot in the environment to
satisfy a rich specification given as an LTL formula over
$\Pi$ (see Sec. II-A). A variety of robotic tasks can be easily
translated to LTL formulas. For example,

• Parking: "Find parking lot and then park"
$((\Diamond\, \text{parking lot}) \wedge (\text{parking lot} \Rightarrow \bigcirc\, \text{park}))$

• Data Collection: "Always gather data at gathering locations
and then upload the data, repeat infinitely many
times" $(\Box\Diamond(\text{gather} \Rightarrow \Diamond\, \text{upload}))$

• Ensure Safety: "Achieve task $\psi$ while always avoiding
states satisfying $P_1$ or $P_2$" $(\Box\neg(P_1 \vee P_2) \wedge \psi)$.

3) Robot motion model: The motion capability of the 
robot in the environment is represented by a set of motion 
primitives $U$, and a function $A : V \to 2^{U}$ that returns the set
of motion primitives available (or enabled) at a vertex $v \in V$.
For example, $U$ can be {Turn Left, Turn Right, Go Straight}
in an urban environment with roads and intersections. To
model non-determinism due to possible actuation or measurement
errors, we define the transition probability function
$P_m : V \times U \times V \to [0, 1]$ such that $\sum_{v' \in V} P_m(v, u, v') = 1$
for all $v \in V$ and $u \in A(v)$, and $P_m(v, u, v') = 0$ if
$(v, v') \notin \delta_{\mathcal{E}}$ or if $u \notin A(v)$. Thus, $P_m(v, u, v')$ is the
probability that after applying the motion primitive $u$ at
vertex $v$, the robot moves from $v$ to an adjacent region $v'$
without passing through other regions. The set U corresponds 
to a set of feedback controllers for the robot. Such feedback 
controllers can be constructed from facet reachability (see 
[24], [25]), and the transition probabilities can be obtained 
from experiments (see [15]). Note that this model of motion 
uses an underlying assumption that transition probabilities of 
the robot controllers do not depend on the previous history 
of the robot. 

4) Robot observation model: In our earlier work [3], we 
assumed that the motion of the robot in the partitioned 
environment is deterministic, and we proposed an automatic 
framework to produce a provably correct control strategy so 
that the trajectory of the robot satisfies an LTL formula. In 
[15], we relaxed this restriction and allowed non-determinism 
in the motion of the robot, and a control strategy for the robot 
was obtained to maximize the probability of satisfying a task 
specified by a fragment of pCTL. In both of these results, it
was assumed that some propositions in $\Pi$ are associated with
each region in the environment (i.e., for each $v \in V$), and
they are fixed in time.

However, this assumption is restrictive and often not true 
in practice. For example, the robot might move to a road 
and find it congested; while finding parking spots, some 
parking spots may already be taken; or while attempting to 
upload data at an upload station, the upload station might be 
occupied. We wish to design control strategies that react to 
information which is observed in real-time, e.g., if a road is 
blocked, then pick another route. 

Motivated by these scenarios, in this paper we consider 
the problem setting where observations of the properties of 
the environment are probabilistic. To this end, we define a
probability function $P_o : V \times \Pi \to [0, 1]$. Thus, $P_o(v, \pi)$ is
the probability that the atomic proposition $\pi \in \Pi$ is observed
at a vertex $v \in V$ when $v$ is visited. We assume that all
observations of atomic propositions for a vertex $v \in V$ are
independent and identically distributed. This is a reasonable
model in situations where the time-scale of robot travel is 
larger than the time scale on which the proposition changes. 
For future work, we are pursuing more general observation 



models. Let $\Pi_v := \{\pi \in \Pi : P_o(v, \pi) > 0\}$ be the atomic
propositions that can be observed at a vertex $v$. Then

$$\mathcal{Z}_v = \Big\{ Z \in 2^{\Pi_v} : \prod_{\pi \in Z} P_o(v, \pi) \times \prod_{\pi \notin Z} (1 - P_o(v, \pi)) > 0 \Big\}$$

is the set of all possible observations at $v$.
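
As an illustration of this definition, the sketch below enumerates the possible observations at one vertex together with their probabilities, under the stated independence assumption. The function names and the dictionary encoding of the observation probabilities are assumptions for illustration; the numbers in the usage example are the values 0.2 and 0.6 given for vertex $v_0$ in Fig. 1.

```python
from itertools import combinations


def observation_prob(Z, probs):
    """Probability of observing exactly the set Z at a vertex.

    probs: dict proposition -> probability that it is observed at the vertex.
    Propositions are assumed independent, so the probability factorizes.
    """
    p = 1.0
    for prop, q in probs.items():
        p *= q if prop in Z else (1.0 - q)
    return p


def possible_observations(probs):
    """Return {Z: probability} for every subset Z with positive probability."""
    observable = sorted(pi for pi, q in probs.items() if q > 0)
    result = {}
    for r in range(len(observable) + 1):
        for combo in combinations(observable, r):
            Z = frozenset(combo)
            p = observation_prob(Z, probs)
            if p > 0:
                result[Z] = p
    return result


if __name__ == "__main__":
    for Z, p in possible_observations({"a": 0.2, "b": 0.6}).items():
        print(sorted(Z), round(p, 4))
```

A proposition observed with probability 1 appears in every possible observation, and one with probability 0 appears in none, so only propositions with probability strictly between 0 and 1 enlarge the set.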
B. Problem Formulation 

Let the initial state of the robot be given as $v_0$. The
trajectory of the robot in the environment is an infinite
sequence $r = v_0 v_1 \ldots$, where $P_m(v_i, u, v_{i+1}) > 0$ for
some $u$ for all $i$. Given $r = v_0 v_1 \ldots$, we call $v_i$ the
state of the robot at the discrete time-step $i$. We denote the
observed atomic propositions at time-step $i$ as $o_i \in 2^{\Pi}$ and
$O(r) = o_0 o_1 \ldots$ as the word observed by $r$. An example of
a trajectory $r$ and its observed word in an environment with
given $\mathcal{E}$, $U$, $A$, $P_m$ and $P_o$ is shown in Fig. 1.



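[Figure 1: graph environment with vertices $v_0, \ldots, v_3$, annotated with the following data.]
$P_o(v_0, a) = 0.2$, $P_o(v_0, b) = 0.6$; $P_o(v_1, b) = 1$; $P_o(v_2, a) = 0.4$, $P_o(v_2, b) = 1$; $P_o(v_3, a) = 1$
$\mathcal{Z}_{v_0} = \{\emptyset, \{a\}, \{b\}, \{a, b\}\}$, $\mathcal{Z}_{v_1} = \{\{b\}\}$, $\mathcal{Z}_{v_2} = \{\{b\}, \{a, b\}\}$, $\mathcal{Z}_{v_3} = \{\{a\}\}$
$r = v_0 v_1 v_2 v_3 v_1 \ldots$
$O(r) = \{b\}\{b\}\{a, b\}\{a\}\{b\} \ldots$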



Fig. 1. An example trajectory $r$ and its observed word $O(r)$. We also
show $\mathcal{Z}_v$ for all $v \in V$. A single arrow pointed towards a state $v_0$
indicates the initial state. The atomic proposition set is $\Pi = \{a, b\}$. The
set of motion primitives is $U = \{u_1, u_2, u_3\}$. The probability function $P_o$
assigns probabilities for all atomic propositions at each state. We show the
probability of an atomic proposition only if it is positive (i.e., $\pi \in \Pi_v$). The
number on top of an arrow pointing from a vertex $v$ to $v'$ is the probability
$P_m(v, u, v')$ associated with a control $u \in U$.

Our desired "reactive" control strategy is in the form of an
infinite sequence $C = \{\nu_0, \nu_1, \ldots\}$ where $\nu_i : V \times 2^{\Pi} \to U$
and $\nu_i(v, Z)$ is defined only if $Z \in \mathcal{Z}_v$. Furthermore, we
enforce that $\nu_i(v, Z) \in A(v)$ for all $v$ and all $i$. The reactive
control strategy returns the control to be applied at each
time-step, given the current state $v$ and observed set of
propositions $Z$ at $v$. Given an initial condition $v_0$ and a
control strategy $C$, we can produce a trajectory $r = v_0 v_1 \ldots$
where the control applied at time $i$ is $\nu_i(v_i, o_i)$. We call $r$ and
$O(r) = o = o_0 o_1 \ldots$ the trajectory and the word generated
under $C$, respectively. Note that given $v_0$ and a control
strategy $C$, the resultant trajectory and its corresponding
word are not unique due to non-determinism in both motion
and observation of the robot.

Now we formulate the following problem: 

Problem 3.1: Given the environment represented by
$\mathcal{E} = (V, \delta_{\mathcal{E}}, \Pi)$; the robot motion model $U$, $A$ and $P_m$; the
observation model $P_o$; and an LTL formula $\phi$ over $\Pi$, find
the control strategy $C$ that maximizes the probability that the
word generated under $C$ satisfies $\phi$.

C. Summary of technical approach 

Our approach to solve Prob. 3.1 proceeds by construction
of a labeled MDP $\mathcal{M}$ (see Def. 2.2), which captures all possible
words that can be observed by the robot. Furthermore,
each control strategy $C$ corresponds uniquely to a policy $M$
on $\mathcal{M}$. Thus, each trajectory with an observed word under a
control strategy $C$ corresponds uniquely to a path on $\mathcal{M}$
under $M$. We then reformulate Prob. 3.1 as the problem
of finding the policy on $\mathcal{M}$ that maximizes the probability
of satisfying $\phi$. These two problems are equivalent due to
the assumption that all observations are independent. We
synthesize the optimal control strategy by solving maximal
reachability probability problems inspired by results in
probabilistic model checking. Our framework is more general
than in [15] due to a richer specification language and
non-determinism in observation of the environment. The
trade-off is that the computational complexity of this approach is in
general much larger due to the increased size of the automaton
representing the specification.

IV. MDP Construction and Problem 
Reformulation 

As part of our approach to solve Problem 3.1, we construct
a labeled MDP $\mathcal{M} = (S, U, A, \mathcal{P}, \iota, \Pi, h)$ from the
environment model $\mathcal{E}$, the robot motion model $U$, $A$, $P_m$,
and the observation model $P_o$ as follows:

• $S = \{(v, Z) \mid v \in V, Z \in \mathcal{Z}_v\}$

• $U = U$

• $A((v, Z)) = A(v)$

• $\mathcal{P}((v, Z), u, (v', Z')) = P_m(v, u, v') \times \prod_{\pi \in Z'} P_o(v', \pi) \times \prod_{\pi \notin Z'} (1 - P_o(v', \pi))$

• $\iota$ is defined as $\iota(s) = \prod_{\pi \in Z} P_o(v_0, \pi) \times \prod_{\pi \notin Z} (1 - P_o(v_0, \pi))$
if $s = (v_0, Z)$ for any $Z \in \mathcal{Z}_{v_0}$, and $\iota(s) = 0$ otherwise.

• $h((v, Z)) = Z$ for all $(v, Z) \in S$.
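An example of a constructed MDP is shown in Fig. 2. One
can easily verify that $\mathcal{M}$ is a valid MDP such that for all
$s \in S$, $\sum_{s' \in S} \mathcal{P}(s, u, s') = 1$ if $u \in A(s)$, $\mathcal{P}(s, u, s') = 0$
if $u \notin A(s)$, and $\sum_{s \in S} \iota(s) = 1$. We discuss the growth of
the state space from $\mathcal{E}$ to $\mathcal{M}$ in Section V-C.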
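
The construction can be sketched in a few lines of code. The following is a minimal illustration, not the authors' implementation: states are pairs of a vertex and an observed set, and each transition probability is the motion probability multiplied by the probability of the observation made on arrival. The function names `obs_dist` and `build_mdp` and the dictionary encodings of the motion and observation models are assumptions for illustration, and the two-vertex environment in the usage example is hypothetical.

```python
from itertools import combinations


def obs_dist(probs):
    """{Z: probability} over observation sets with positive probability,
    assuming the propositions are observed independently."""
    props = [pi for pi, q in probs.items() if q > 0]
    dist = {}
    for r in range(len(props) + 1):
        for combo in combinations(props, r):
            Z = frozenset(combo)
            p = 1.0
            for pi in props:
                p *= probs[pi] if pi in Z else 1.0 - probs[pi]
            if p > 0:
                dist[Z] = p
    return dist


def build_mdp(P_m, P_o, v0):
    """Build (S, A, P, iota, h) from a motion model and an observation model.

    P_m: dict (v, u, v_next) -> motion probability
    P_o: dict v -> {proposition: observation probability}
    v0:  initial vertex
    """
    dists = {v: obs_dist(P_o[v]) for v in P_o}
    S = [(v, Z) for v in dists for Z in dists[v]]
    A, P = {}, {}
    for (v, u, v2), pm in P_m.items():
        if pm <= 0:
            continue
        for Z in dists[v]:
            A.setdefault((v, Z), set()).add(u)
            for Z2, po in dists[v2].items():
                # Move from v to v2, then observe Z2 at v2.
                P[((v, Z), u, (v2, Z2))] = pm * po
    iota = {s: (dists[v0][s[1]] if s[0] == v0 else 0.0) for s in S}
    h = {s: s[1] for s in S}  # the label of a state is its observed set
    return S, A, P, iota, h


if __name__ == "__main__":
    # Hypothetical two-vertex environment with one motion primitive.
    P_m = {("v0", "u1", "v1"): 0.9, ("v0", "u1", "v0"): 0.1, ("v1", "u1", "v0"): 1.0}
    P_o = {"v0": {"a": 0.2, "b": 0.6}, "v1": {"b": 1.0}}
    S, A, P, iota, h = build_mdp(P_m, P_o, "v0")
    print(len(S), "states,", len(P), "transitions")
```

Because the observation probabilities at each vertex sum to one, every enabled action yields a proper distribution over successor states, which is the validity condition stated above.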



We now formulate a problem on the MDP $\mathcal{M}$. We will
then show that this new problem is equivalent to Prob. 3.1.

Problem 4.1: For a given labeled MDP $\mathcal{M}$ and an LTL
formula $\phi$, find a policy $M$ such that $\Pr_{\mathcal{M}}^{M}(\phi)$ (see Eq. (2)) is
maximized.

The following proposition formalizes the equivalence between
the two problems, and the one-to-one correspondence
between a control strategy on $\mathcal{E}$ and a policy on $\mathcal{M}$.

Proposition 4.2 (Equivalence of problems): A control
strategy $C = \{\nu_0, \nu_1, \ldots\}$ is a solution to Problem 3.1 if
and only if the policy $M = \{\mu_0, \mu_1, \ldots\}$, where

$\mu_i((v_i, Z_i)) = \nu_i(v_i, Z_i)$ for each $i$,

is a solution to Problem 4.1.

Proof: We can establish a one-to-one correspondence
between a control strategy $C$ in the environment $\mathcal{E}$ and a
policy $M$ on $\mathcal{M}$. Given $C = \{\nu_0, \nu_1, \ldots\}$, we can obtain the
corresponding $M = \{\mu_0, \mu_1, \ldots\}$ by setting
$\mu_i((v_i, Z_i)) = \nu_i(v_i, Z_i)$. Conversely, given $M = \{\mu_0, \mu_1, \ldots\}$, we can




Fig. 2. The constructed MDP $\mathcal{M}$ using $\mathcal{E}$, $U$, $A$, $P_m$ and $P_o$ from the
example in Fig. 1. For each state $s \in S$, the labels on top of the state
show the components of $s$ (i.e., $s = (v, Z)$). The number on the arrow
from the state $(v, Z)$ to the state $(v', Z')$ denotes the transition probability
$\mathcal{P}((v, Z), u, (v', Z'))$ for the action $u \in U$. The numbers atop arrows
pointing into states $(v_0, Z_{v_0})$ denote the initial distribution. The set of
atomic propositions assigned to each state in $\mathcal{M}$ is the second component
of the state.



generate a corresponding control strategy $C = \{\nu_0, \nu_1, \ldots\}$
such that $\nu_i(v_i, Z_i) = \mu_i((v_i, Z_i))$.

We need only to verify that we can use the same probability
measure on trajectories in $\mathcal{E}$ and on paths in $\mathcal{M}$. Due to
the assumption that each observation at $v$ is independent,
the observation process is Markovian, as it depends only
on the vertex at which the observation is made. Note that the
probability of observing $Z \in \mathcal{Z}_v$ at a state $v \in V$ is
$\prod_{\pi \in Z} P_o(v, \pi) \times \prod_{\pi \notin Z} (1 - P_o(v, \pi))$. Hence, the probability
of moving from a vertex $v$ to $v'$ under control $u$ and
observing $Z' \in \mathcal{Z}_{v'}$ is exactly $\mathcal{P}((v, Z), u, (v', Z'))$. Therefore,
the probability of observing the finite word $O(r^f)$
for a finite trajectory $r^f = v_0 \ldots v_n$ under $C$ is the same
(by construction of $\mathcal{M}$) as the probability of traversing a finite path
$fr_{\mathcal{M}}^{M} \in \mathit{FPaths}_{\mathcal{M}}^{M}$ such that $h(fr_{\mathcal{M}}^{M}) = O(r^f)$ under the
policy $M$ corresponding to $C$. Since this property holds for
any arbitrary finite trajectory $r^f$, a trajectory $r$ with a word
$O(r)$ under $C$ can be uniquely mapped to a path in $\mathit{Paths}_{\mathcal{M}}^{M}$,
and we can use the probability measure and $\sigma$-algebra (see
Sec. II-B) on $\mathcal{M}$ under a policy $M$ for the corresponding
control strategy $C$. Thus, if $M$ is a solution for Prob. 4.1,
then the control strategy $C$ corresponding to $M$ is a solution
for Prob. 3.1, and vice versa. ■



Due to the above proposition, we will proceed by constructing
a policy $M$ on the MDP $\mathcal{M}$ as a solution to Prob. 4.1.
We can then uniquely map $M$ to a control strategy $C$ in the
robot environment $\mathcal{E}$ for a solution to Prob. 3.1.



V. Synthesis of Control Strategy 

In this section we provide a solution for Prob. 3.1 by
synthesizing an optimal policy for Prob. 4.1. Our approach
is adapted from automata-theoretic approaches in the area
of probabilistic verification and model checking (see [20]
and references therein for an overview). Probabilistic LTL
model checking finds the maximum probability that a path
of a given MDP satisfies an LTL specification. We modify
this method to obtain an optimal policy that achieves the
maximum probability. This approach is related to the work of
[26], in which rewards are assigned to specifications and
non-deterministic Büchi automata (NBA) are used. We do not
use NBAs since a desired product MDP cannot be directly
constructed from an NBA, but only from a DRA.

A. The Product MDP 

We start by converting the LTL formula $\phi$ to a DRA
defined in Def. 2.1. We denote the resulting DRA as
$\mathcal{R}_\phi = (Q, 2^{\Pi}, \delta, q_0, F)$ with $F = \{(L_1, K_1), \ldots, (L_k, K_k)\}$, where
$L_i, K_i \subseteq Q$ for all $i = 1, \ldots, k$. The DRA obtained from
the LTL formula $\phi = \Box\Diamond a \wedge \Box\Diamond b$ is shown in Fig. 3.






Fig. 3. The DRA $\mathcal{R}_\phi$ corresponding to the LTL formula $\phi = \Box\Diamond a \wedge \Box\Diamond b$.
In this example, there is one accepting pair, $F = \{(L, K)\}$, where
$L = \emptyset$ and $K = \{q_3, q_4\}$. Thus, accepting runs of this DRA must visit $q_3$
or $q_4$ (or both) infinitely often.



We now obtain an MDP as the product of a labeled MDP
$\mathcal{M}$ and a DRA $\mathcal{R}_\phi$. This product MDP allows one to find
runs on $\mathcal{M}$ that generate words satisfying the acceptance
condition of $\mathcal{R}_\phi$.

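Definition 5.1 (Product MDP): The product MDP $\mathcal{M} \times \mathcal{R}_\phi$
between a labeled MDP $\mathcal{M} = (S, U, A, \mathcal{P}, \iota, \Pi, h)$
and a DRA $\mathcal{R}_\phi = (Q, 2^{\Pi}, \delta, q_0, F)$ is an MDP
$\mathcal{M}_{\mathcal{P}} = (S_{\mathcal{P}}, U, A_{\mathcal{P}}, \mathcal{P}_{\mathcal{P}}, \iota_{\mathcal{P}})$, where:

• $S_{\mathcal{P}} = S \times Q$ (the Cartesian product of sets $S$ and $Q$)

• $A_{\mathcal{P}}((s, q)) = A(s)$

• $\mathcal{P}_{\mathcal{P}}((s, q), u, (s', q')) = \mathcal{P}(s, u, s')$ if $q' = \delta(q, h(s'))$, and $0$ otherwise

• $\iota_{\mathcal{P}}((s, q)) = \iota(s)$ if $q = \delta(q_0, h(s))$, and $\iota_{\mathcal{P}}((s, q)) = 0$ otherwise.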

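We generate the accepting state pairs $F_{\mathcal{P}}$ for the product
MDP $\mathcal{M}_{\mathcal{P}}$ as follows: For a pair $(L_i, K_i) \in F$, a state $(s, q)$
of $\mathcal{M}_{\mathcal{P}}$ is in $L_i^{\mathcal{P}}$ if $q \in L_i$, and $(s, q) \in K_i^{\mathcal{P}}$ if $q \in K_i$.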
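
The product construction and the lifting of the accepting pairs can be sketched as follows. This is an illustrative encoding, not the authors' implementation: the DRA transition function is assumed to be given as a dictionary keyed by an automaton state and a label (a frozenset of propositions), and the two-state automaton in the usage example is a hypothetical DRA, not the one of Fig. 3.

```python
def product_mdp(S, P, iota, h, Q, delta, q0, pairs):
    """Product of a labeled MDP and a DRA.

    S: list of MDP states; P: dict (s, u, s_next) -> probability
    iota: dict s -> initial probability; h: dict s -> frozenset label
    Q: list of DRA states; delta: dict (q, label) -> q_next; q0: initial state
    pairs: list of (L, K) sets of DRA states
    Returns (S_p, P_p, iota_p, pairs_p).
    """
    S_p = [(s, q) for s in S for q in Q]
    P_p = {}
    for (s, u, s2), p in P.items():
        for q in Q:
            # The DRA reads the label of the state being entered.
            q2 = delta[(q, h[s2])]
            P_p[((s, q), u, (s2, q2))] = p
    iota_p = {
        (s, q): (iota[s] if q == delta[(q0, h[s])] else 0.0) for (s, q) in S_p
    }
    pairs_p = [
        ({sq for sq in S_p if sq[1] in L}, {sq for sq in S_p if sq[1] in K})
        for L, K in pairs
    ]
    return S_p, P_p, iota_p, pairs_p


if __name__ == "__main__":
    empty, a = frozenset(), frozenset({"a"})
    S = ["s0", "s1"]
    h = {"s0": empty, "s1": a}
    P = {("s0", "u", "s1"): 1.0, ("s1", "u", "s0"): 0.5, ("s1", "u", "s1"): 0.5}
    iota = {"s0": 1.0, "s1": 0.0}
    # Hypothetical DRA that moves to q1 once "a" has been seen and stays there.
    Q = ["q0", "q1"]
    delta = {("q0", empty): "q0", ("q0", a): "q1", ("q1", empty): "q1", ("q1", a): "q1"}
    S_p, P_p, iota_p, pairs_p = product_mdp(S, P, iota, h, Q, delta, "q0", [(set(), {"q1"})])
    print(len(S_p), "product states,", len(P_p), "transitions")
```

Since the automaton is deterministic, each transition of the MDP yields exactly one transition from every product state, so the product keeps the transition probabilities of the original MDP.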

As an example, we show in Fig. 4 some of the states and
transitions for the product MDP $\mathcal{M}_{\mathcal{P}} = \mathcal{M} \times \mathcal{R}_\phi$, where $\mathcal{M}$
is shown in Fig. 2 and $\mathcal{R}_\phi$ is shown in Fig. 3.

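Note that the set of actions for $\mathcal{M}_{\mathcal{P}}$ is the same as the
one for $\mathcal{M}$. A policy $M_{\mathcal{P}} = \{\mu_0^{\mathcal{P}}, \mu_1^{\mathcal{P}}, \ldots\}$ on $\mathcal{M}_{\mathcal{P}}$ directly
induces a policy $M = \{\mu_0, \mu_1, \ldots\}$ on $\mathcal{M}$ by keeping track
of the state on the product MDP ($\mu_i^{\mathcal{P}}$ is an action function
that returns an action corresponding to a state in $\mathcal{M}_{\mathcal{P}}$). Note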
Fig. 4. The product MDP $\mathcal{M}_{\mathcal{P}} = \mathcal{M} \times \mathcal{R}_\phi$, where $\mathcal{M}$ is shown in Fig. 2
and $\mathcal{R}_\phi$ is shown in Fig. 3. Due to space restrictions, only the initial states
(states for which the initial distribution $\iota_{\mathcal{P}}((s, q))$ is positive) and part of
the state space are shown. $F_{\mathcal{P}} = \{(L^{\mathcal{P}}, K^{\mathcal{P}})\}$, where $L^{\mathcal{P}} = \emptyset$ and states
in $K^{\mathcal{P}}$ are marked.



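that given the state of $\mathcal{M}$ at time-step $i$ and the state of $\mathcal{M}_{\mathcal{P}}$
at time-step $i - 1$, the state of $\mathcal{M}_{\mathcal{P}}$ at time-step $i$ can be
exactly determined. We can induce a policy $M$ for $\mathcal{M}$ from
a policy $M_{\mathcal{P}}$ for $\mathcal{M}_{\mathcal{P}}$ as follows: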

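Definition 5.2 (Inducing a policy for $\mathcal{M}$ from $\mathcal{M}_{\mathcal{P}}$): If
the state of $\mathcal{M}_{\mathcal{P}}$ at time-step $i$ is $(s_i, q_i)$, then the policy
$M = \{\mu_0, \mu_1, \ldots\}$ induced from $M_{\mathcal{P}} = \{\mu_0^{\mathcal{P}}, \mu_1^{\mathcal{P}}, \ldots\}$ can
be obtained by setting $\mu_i(s_i) = \mu_i^{\mathcal{P}}((s_i, q_i))$ for all $i$.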
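
In practice, inducing the policy amounts to running the automaton alongside the MDP. The sketch below shows one way to do this for the special case of a stationary product policy stored as a lookup table. The class name `InducedController`, its interface, and the example automaton are hypothetical and chosen only for illustration.

```python
class InducedController:
    """Runs a stationary product-MDP policy on the original MDP by
    tracking the DRA state alongside the observed MDP state."""

    def __init__(self, mu_p, delta, q0, h):
        self.mu_p = mu_p    # dict (s, q) -> action
        self.delta = delta  # dict (q, label) -> q_next
        self.h = h          # dict s -> label (frozenset of propositions)
        self.q = q0         # DRA state before any label has been read

    def act(self, s):
        # Advance the DRA on the label of the current MDP state, then
        # look up the product policy at the resulting product state.
        self.q = self.delta[(self.q, self.h[s])]
        return self.mu_p[(s, self.q)]


if __name__ == "__main__":
    empty, a = frozenset(), frozenset({"a"})
    h = {"s0": empty, "s1": a}
    delta = {("q0", empty): "q0", ("q0", a): "q1", ("q1", empty): "q1", ("q1", a): "q1"}
    mu_p = {("s0", "q0"): "go", ("s1", "q1"): "stay", ("s0", "q1"): "stay"}
    ctrl = InducedController(mu_p, delta, "q0", h)
    print([ctrl.act(s) for s in ["s0", "s1", "s0"]])
```

The same MDP state can receive different actions at different times because the tracked automaton state differs, which is why the induced policy on the original MDP is generally not stationary.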

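We denote $r_{\mathcal{M}_{\mathcal{P}}}^{M_{\mathcal{P}}}$ as a path on $\mathcal{M}_{\mathcal{P}}$ under a policy $M_{\mathcal{P}}$.
We say a path $r_{\mathcal{M}_{\mathcal{P}}}^{M_{\mathcal{P}}}$ is accepting if and only if it satisfies the
Rabin acceptance condition with $F_{\mathcal{P}}$ as the accepting state
pairs, i.e., there exists a pair $(L^{\mathcal{P}}, K^{\mathcal{P}}) \in F_{\mathcal{P}}$ so that $r_{\mathcal{M}_{\mathcal{P}}}^{M_{\mathcal{P}}}$
intersects with $L^{\mathcal{P}}$ finitely many times and $K^{\mathcal{P}}$ infinitely
many times.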

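The product MDP is constructed so that given a path
$r_{\mathcal{M}_{\mathcal{P}}}^{M_{\mathcal{P}}} = (s_0, q_0)(s_1, q_1) \ldots$, the path $s_0 s_1 \ldots$ on $\mathcal{M}$ generates
a word that satisfies $\phi$ if and only if the infinite sequence
$q_0 q_1 \ldots$ is an accepting run on $\mathcal{R}_\phi$, in which case $r_{\mathcal{M}_{\mathcal{P}}}^{M_{\mathcal{P}}}$ is
accepting. Therefore, each accepting path of $\mathcal{M}_{\mathcal{P}}$ uniquely
corresponds to a path of $\mathcal{M}$ whose word satisfies $\phi$.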

B. Generating the Optimal Control Strategy 

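Once we obtain the product MDP $\mathcal{M}_{\mathcal{P}} = \mathcal{M} \times \mathcal{R}_\phi$ and the
accepting state pairs $F_{\mathcal{P}} = \{(L_1^{\mathcal{P}}, K_1^{\mathcal{P}}), \ldots, (L_k^{\mathcal{P}}, K_k^{\mathcal{P}})\}$,
the method to obtain a solution to Prob. 3.1 proceeds as
follows: For each pair $(L_i^{\mathcal{P}}, K_i^{\mathcal{P}}) \in F_{\mathcal{P}}$, we obtain a set of
accepting maximum end components. An accepting maximum
end component for $\mathcal{M}_{\mathcal{P}}$ consists of a set of states $\underline{S}_{\mathcal{P}} \subseteq S_{\mathcal{P}}$
and a function $\underline{A}_{\mathcal{P}}$ such that $\emptyset \neq \underline{A}_{\mathcal{P}}((s, q)) \subseteq A_{\mathcal{P}}((s, q))$
for all $(s, q) \in \underline{S}_{\mathcal{P}}$. It has the property that, by taking actions
enabled by $\underline{A}_{\mathcal{P}}$, all states in $\underline{S}_{\mathcal{P}}$ can reach every other state in
$\underline{S}_{\mathcal{P}}$ and cannot reach any state outside of $\underline{S}_{\mathcal{P}}$. Furthermore,
it contains no state in $L_i^{\mathcal{P}}$ and at least one state in $K_i^{\mathcal{P}}$. In
addition, it is called maximum because it is not properly
contained in another accepting end component.
Note that for a given pair $(L_i^{\mathcal{P}}, K_i^{\mathcal{P}})$, its accepting maximum
end components are pairwise disjoint.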
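
The defining properties of an accepting end component can be checked directly for a candidate pair of a state set and an action restriction. The sketch below performs only this check: it tests closure, mutual reachability, and the conditions on the accepting pair, but it does not test maximality and it is not the procedure of [19] for computing all such components. The function name and the encoding of the product MDP as a transition dictionary are assumptions for illustration.

```python
def is_accepting_end_component(S_sub, A_sub, P, L, K):
    """Check whether (S_sub, A_sub) is an accepting end component.

    S_sub: iterable of states; A_sub: dict state -> set of retained actions
    P: dict (p, u, t) -> probability; L, K: one accepting pair (sets of states)
    """
    S_sub = set(S_sub)
    # Must avoid L entirely and contain at least one state of K.
    if not S_sub or S_sub & set(L) or not (S_sub & set(K)):
        return False
    fwd = {p: set() for p in S_sub}
    for p in S_sub:
        if not A_sub.get(p):
            return False  # every state needs at least one retained action
        for u in A_sub[p]:
            succ = {t for (p2, u2, t), pr in P.items()
                    if p2 == p and u2 == u and pr > 0}
            if not succ or not succ <= S_sub:
                return False  # the action is undefined or may leave S_sub
            fwd[p] |= succ
    bwd = {p: set() for p in S_sub}
    for p, ts in fwd.items():
        for t in ts:
            bwd[t].add(p)

    def reachable(graph, start):
        seen, stack = {start}, [start]
        while stack:
            for nxt in graph[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # Strongly connected iff one state reaches all others and is reached by all.
    start = next(iter(S_sub))
    return reachable(fwd, start) == S_sub and reachable(bwd, start) == S_sub
```

For example, with transitions `x -a-> y`, `y -b-> x`, and an action `a` at `y` that may leave to a third state, the candidate `{x, y}` is an end component only if the action restriction at `y` keeps `b` and drops `a`.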

A procedure to obtain all accepting maximum end components
of an MDP is outlined in [19]. From probabilistic
model checking (see [19], [20]), the maximum probability
of satisfying the LTL formula $\phi$ for $\mathcal{M}$ is the same as the
maximum probability of reaching any accepting maximum
end component of $\mathcal{M}_{\mathcal{P}}$. Once an accepting maximum end
component $(\underline{S}_{\mathcal{P}}, \underline{A}_{\mathcal{P}})$ is reached, all states in $\underline{S}_{\mathcal{P}}$ are
reached infinitely often (and $\phi$ is satisfied) with probability 1
under a policy in which all actions in $\underline{A}_{\mathcal{P}}$ are used infinitely often.

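The maximum probability of reaching a set of states
$B_{\mathcal{P}} \subseteq S_{\mathcal{P}}$ can be obtained by the solution of a linear
program. First we find the set of states that cannot reach
$B_{\mathcal{P}}$ under any policy and denote it as $C_{\mathcal{P}}$ (a simple graph
analysis is sufficient to find this set). We then let $x_p$ denote
the maximum probability of reaching the set $B_{\mathcal{P}}$ from a
state $p \in S_{\mathcal{P}}$. We have $x_p = 1$ if $p \in B_{\mathcal{P}}$, and $x_p = 0$ if
$p \in C_{\mathcal{P}}$. The values of $x_p$ for the remaining states can then
be determined by solving a linear optimization problem:

$$\min \sum_{p \in S_{\mathcal{P}}} x_p, \quad \text{subject to: } 0 \leq x_p \leq 1, \text{ and}$$

$$x_p \geq \sum_{t \in S_{\mathcal{P}}} \mathcal{P}_{\mathcal{P}}(p, u, t)\, x_t \quad \text{for all } p \in S_{\mathcal{P}} \setminus (B_{\mathcal{P}} \cup C_{\mathcal{P}}) \text{ and for all } u \in A_{\mathcal{P}}(p). \quad (4)$$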

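Once $x_p$ is obtained for all $p \in S_{\mathcal{P}}$, one can identify
an action $u^*$ (not necessarily unique) for each state
$p \in S_{\mathcal{P}} \setminus (B_{\mathcal{P}} \cup C_{\mathcal{P}})$ such that:

$$x_p = \sum_{t \in S_{\mathcal{P}}} \mathcal{P}_{\mathcal{P}}(p, u^*, t)\, x_t. \quad (5)$$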



We define a function frp : Sp — > U that returns an action u* 
satisfying (|3j if p G Sp \ (B-p U Cp) (the actions for states in 
B-p or Cp are irrelevant and can be chosen arbitrarily). Then 
the optimal policy maximizing the probability of reaching 
B-p is the stationary policy {/ip, /tp, . . .}. 
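Selecting the actions from the computed values can be sketched as follows. This is a minimal Python illustration under assumed encodings (`trans[p][u] -> {t: probability}` and a dictionary `x` of reachability values); the function name is hypothetical. Ties are broken by action name, which is an arbitrary choice: when several actions attain the same value, an implementation should take care to pick one that actually makes progress towards the target set.

```python
def extract_policy(trans, x, targets, cannot_reach):
    """For each remaining state, pick an action attaining the largest
    expected value of x over its successors."""
    policy = {}
    for p in trans:
        if p in targets or p in cannot_reach:
            continue  # actions in these states are irrelevant
        policy[p] = max(
            sorted(trans[p]),  # sorted for deterministic tie-breaking
            key=lambda u: sum(pr * x[t] for t, pr in trans[p][u].items()),
        )
    return policy
```

The resulting dictionary represents one stationary policy: the same state-to-action map is applied at every time step.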

Our desired policy $M_{\mathcal{P}}^{*}$ that maximizes the probability of
satisfying $\phi$ on the product MDP is the policy maximizing
the probability of reaching the union of all accepting maximum
end components of all accepting state pairs in $F_{\mathcal{P}}$, if
the state is not in an accepting maximum end component.
Otherwise, the optimal policy is to use all actions allowed in
the associated accepting maximum end component infinitely
often in a round-robin fashion. The solution to Prob. 4.1 is
then the policy $M^{*}$ on $\mathcal{M}$ induced by $M_{\mathcal{P}}^{*}$. The desired
reactive control strategy $C^{*}$ as a solution to Prob. 3.1 can
finally be obtained as the control strategy corresponding to
$M^{*}$ (see Prop. 4.2). Our overall approach is summarized in
Alg. 1.
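The two-phase structure of this policy (reach an accepting maximum end component, then cycle through its actions) can be sketched in Python. The containers `mu_reach` and `amec_actions` are hypothetical encodings of the stationary reachability policy and of the actions enabled inside the end components; they are not taken from the authors' implementation.

```python
def product_policy(i, p, mu_reach, amec_actions):
    """Return the action taken at time step i in product state p.

    mu_reach: dict state -> action (stationary reachability policy).
    amec_actions: dict state -> ordered list of the actions enabled in the
    accepting maximum end component containing that state.
    """
    if p in amec_actions:
        actions = amec_actions[p]
        # Round-robin over the enabled actions (0-based list index).
        return actions[i % len(actions)]
    return mu_reach[p]
```

Because the action inside an end component depends on the time index `i`, the overall policy is not stationary, even though the reachability part is.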

Algorithm 1 Generating the optimal control strategy $C^{*}$
given $\mathcal{E}$, $U$, $A$, $P_m$, $P_o$ and $\phi$
1: Generate the MDP $\mathcal{M}$ from the environment model $\mathcal{E}$,
the motion primitives $U$, the actions $A$, the motion model
$P_m$ and the observation model $P_o$.
2: Translate the LTL formula $\phi$ to a deterministic Rabin
automaton $\mathcal{R}_\phi$.
3: Generate the product MDP $\mathcal{M}_{\mathcal{P}} = \mathcal{M} \times \mathcal{R}_\phi$ and the
accepting state pairs $F_{\mathcal{P}} = \{(L_1^{\mathcal{P}}, K_1^{\mathcal{P}}), \ldots, (L_k^{\mathcal{P}}, K_k^{\mathcal{P}})\}$.
4: Find all accepting maximum end components for all
pairs $(L_i^{\mathcal{P}}, K_i^{\mathcal{P}}) \in F_{\mathcal{P}}$, and find their union $B_{\mathcal{P}}$.
5: Find the stationary policy $\{\mu_{\mathcal{P}}^{\star}, \mu_{\mathcal{P}}^{\star}, \ldots\}$ maximizing the
probability of reaching $B_{\mathcal{P}}$ by solving (4) and (5).
6: Generate the policy $M_{\mathcal{P}}^{*} = \{\mu_0^{\mathcal{P}}, \mu_1^{\mathcal{P}}, \ldots\}$ as follows:
$\mu_i^{\mathcal{P}}(p) = \mu_{\mathcal{P}}^{\star}(p)$ if $p \in S_{\mathcal{P}} \setminus B_{\mathcal{P}}$. Otherwise, $p$ is in at
least one accepting maximum end component. Assuming
it is $(\underline{S}_{\mathcal{P}}, \underline{A}_{\mathcal{P}})$ and $\underline{A}_{\mathcal{P}}(p) = \{u_1, u_2, \ldots, u_m\}$, then
$\mu_i^{\mathcal{P}}(p) = u_j$ where $j = (i \bmod m) + 1$.
7: Generate the policy $M^{*} = \{\mu_0, \mu_1, \ldots\}$ induced by $M_{\mathcal{P}}^{*}$.
8: Generate the control strategy $C^{*} = \{\nu_0, \nu_1, \ldots\}$ corresponding
to $M^{*}$ by setting $\nu_i(v, Z) = \mu_i((v, Z))$ for
all $i$.

C. Complexity

The complexity of our proposed algorithm is dictated by
the size of the generated MDPs. We use $|\cdot|$ to denote the
cardinality of a set. The number of states in $\mathcal{M}$ is $|S| = \sum_{v \in V} |Z_v|$.
Hence, in the worst case where all propositions $\pi \in \Pi$ can
be observed with positive but less than 1 probability at all
vertices $v \in V$, $|S| = |V| \times 2^{|\Pi|}$. In practice, the number of
propositions that can be non-deterministically observed at
a vertex is small. For example, in an urban setting, most
regions of the environment, including intersections and roads,
have fixed atomic propositions. The number of intersections
that can be blocked is small compared to the size of the
environment.
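The state count can be illustrated with a short Python sketch. It assumes that the number of possible observation sets at a vertex is 2 raised to the number of propositions whose observation probability there lies strictly between 0 and 1 (propositions with probability 0 or 1 contribute no branching). The 13-vertex layout and the dictionary encoding are assumptions made for the illustration; the probabilities are those of the example in Section VI.

```python
def count_mdp_states(vertices, obs_prob):
    """Sum, over all vertices, the number of possible observation sets.

    obs_prob maps (vertex, proposition) to an observation probability;
    pairs that are absent are treated as probability 0.
    """
    total = 0
    for v in vertices:
        uncertain = sum(
            1 for (w, _), pr in obs_prob.items() if w == v and 0 < pr < 1
        )
        total += 2 ** uncertain
    return total


# Hypothetical encoding of the example: vertices v1..v13 are assumed.
vertices = [f"v{i}" for i in range(1, 14)]
obs_prob = {
    ("v13", "pickup"): 1.0,
    ("v13", "observe9"): 0.4,
    ("v7", "event7"): 1.0,
    ("v9", "event9"): 0.8,
}
n_states = count_mdp_states(vertices, obs_prob)
```

Under these assumptions only two vertices have an uncertain proposition, so the count stays close to the number of vertices rather than approaching the exponential worst case.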

The size of the DRA, $|Q|$, is in the worst case doubly
exponential with respect to $|\Pi|$. However, empirical studies such
as [22] have shown that in practice, the sizes of the DRAs
for many LTL formulas are exponential or lower with respect
to $|\Pi|$. In robot control applications, since properties in the
environment are typically assigned sparsely (meaning that
each region of the environment is usually assigned a small
number of properties compared to $|\Pi|$), the size of the DRA
can be reduced much further by removing transitions in the
DRA with inputs that can never appear in the environment,
and then minimizing the DRA by removing states that
cannot be reached from the initial state.
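The pruning step described above can be sketched in Python. The encoding is an assumption made for illustration: the DRA transition function is a dictionary from `(state, input)` to the next state, with inputs represented as frozensets of atomic propositions, and `possible_inputs` is the set of inputs that can actually be observed in the environment.

```python
def prune_dra(init, delta, possible_inputs):
    """Drop transitions on impossible inputs, then drop unreachable states.

    Returns the set of reachable states and the restricted transition map.
    """
    # Step 1: keep only transitions whose input can occur in the environment.
    kept = {(q, a): q2 for (q, a), q2 in delta.items() if a in possible_inputs}
    # Step 2: forward search from the initial state over the kept transitions.
    reachable, stack = {init}, [init]
    while stack:
        q = stack.pop()
        for (q1, _), q2 in kept.items():
            if q1 == q and q2 not in reachable:
                reachable.add(q2)
                stack.append(q2)
    # Step 3: discard transitions that start in unreachable states.
    kept = {(q, a): q2 for (q, a), q2 in kept.items() if q in reachable}
    return reachable, kept
```

The acceptance pairs of the automaton would then be intersected with the returned set of reachable states.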

The size of the product MDP $\mathcal{M}_{\mathcal{P}}$ is $|S| \times |Q|$. The
complexity of the algorithm generating accepting maximum
end components is at most quadratic in the size of $\mathcal{M}_{\mathcal{P}}$ (see
[19]), and the complexity of finding the optimal policy from
a linear program is polynomial in the size of $\mathcal{M}_{\mathcal{P}}$. Thus,
overall, our algorithm is polynomial in the size of $\mathcal{M}_{\mathcal{P}}$.

VI. Example 

The computational framework developed in this paper is
implemented in MATLAB, and here we provide an example
as a case study. Consider a robot navigating in an indoor
environment as shown in Fig. 5. Each region of the environment
is represented by a vertex $v_i$, and the arrows represent
allowable transitions between regions. In this case study,
we choose the motion primitives arbitrarily (see the caption
of Fig. 5). In practice, they can either correspond to low
level control actions such as "turn left", "turn right" and "go
straight", or high level commands such as "go from region
1 to region 2", which can then be achieved by a sequence
of low level control actions.

Fig. 5. Environment for a numerical example of the proposed approach.
We assume that the set of motion primitives is $U = \{\alpha, \beta, \gamma\}$. The number
of actions available at each vertex depends on the number of arrows from
that vertex to adjacent vertices. We define the enabling function $A$ so that
the motion primitive $\alpha$ is enabled at all vertices, $\beta$ is enabled at vertices
$v_1$, $v_6$ and $v_7$, and $\gamma$ is enabled at vertices $v_2$, $v_3$, vq, $v_7$ and $v_9$.
Annotations in the figure: $P_o(v_{13}, \mathtt{pickup}) = 1$, $P_o(v_{13}, \mathtt{observe9}) = 0.4$,
$P_o(v_7, \mathtt{event7}) = 1$, $P_o(v_9, \mathtt{event9}) = 0.8$.

The goal of the robot is to perform a persistent surveillance
mission on regions $v_7$ and $v_9$, described as follows: The
robot can pick up (or receive) a surveillance task at region
$v_{13}$. With probability 0.4 the robot receives the task denoted
observe9. Otherwise, the task is observe7. The task
observe7 (or observe9) is completed by traveling to
region $v_7$ (or $v_9$), and observing some specified event. In
region $v_7$, the robot observes the event (event7) with
probability 1. In region $v_9$, each time the robot enters the
region, there is a probability of 0.8 that it observes the event
(event9). Thus, the robot may have to visit $v_9$ multiple
times before observing event9. Once the robot observes
the required event, it must return to $v_{13}$ and pick up a new
task.
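As a worked illustration of why several visits to $v_9$ may be needed, assume that each entry into $v_9$ yields event9 independently with probability 0.8, in line with the independent observation model used here. The number of visits $N$ needed to observe the event is then geometrically distributed:

```latex
P(N = n) = 0.2^{\,n-1} \cdot 0.8, \qquad
\mathbb{E}[N] = \frac{1}{0.8} = 1.25, \qquad
P(N > n) = 0.2^{\,n}.
```

Under this assumption the robot needs 1.25 visits on average, and more than two visits are needed with probability 0.04.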

This surveillance mission can be represented by four atomic
propositions {pickup, observe9, event7, event9} (the task
observe7 can be written as $\neg$observe9). The
propositions pickup and observe9 are assigned to $v_{13}$,
with $P_o(v_{13}, \mathtt{pickup}) = 1$ and $P_o(v_{13}, \mathtt{observe9}) = 0.4$.
The proposition event7 is assigned to $v_7$ with
$P_o(v_7, \mathtt{event7}) = 1$ and event9 is assigned to $v_9$
with $P_o(v_9, \mathtt{event9}) = 0.8$.

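The surveillance mission can be written as the following
LTL formula:

$$
\begin{aligned}
\phi = {} & \Box\Diamond\, \mathtt{pickup} \;\wedge \\
& \Box\big(\mathtt{pickup} \wedge \neg\mathtt{observe9} \Rightarrow \bigcirc(\neg\mathtt{pickup} \;\mathcal{U}\; \mathtt{event7})\big) \\
& \wedge\; \Box\big(\mathtt{pickup} \wedge \mathtt{observe9} \Rightarrow \bigcirc(\neg\mathtt{pickup} \;\mathcal{U}\; \mathtt{event9})\big).
\end{aligned}
$$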

The first line of $\phi$, $\Box\Diamond\,\mathtt{pickup}$, enforces that the robot
must repeatedly pick up tasks. The second line pertains to
task observe7 and the third line pertains to task observe9.
These two lines ensure that a new task cannot be picked up
until the current task is completed (i.e., the desired event is
observed). Note that if event9 is observed after observing
event7, then the formula $\phi$ is not violated (and similarly
if event7 is observed after observing event9).
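The second and third lines of the formula can be monitored on a finite prefix of an observed word. The Python sketch below is illustrative: it represents the word as a list of sets of proposition names (an assumed encoding) and reports a violation when a new pickup occurs before the required event. The first line of the formula is a liveness requirement and cannot be decided from a finite prefix, so it is not checked.

```python
def check_prefix(word):
    """Scan a finite word for a violation of the two 'until' clauses.

    Returns (violated, pending): `violated` is True if a pickup occurs
    before the required event; `pending` is the set of events still
    awaited at the end of the prefix.
    """
    pending = set()
    for letter in word:
        # Obligations created at earlier positions are checked first.
        pending -= letter  # the required event has been observed
        if pending and "pickup" in letter:
            return True, pending  # a new task was picked up too early
        if "pickup" in letter:
            # The new obligation applies from the next position onward.
            pending.add("event9" if "observe9" in letter else "event7")
    return False, pending


# A word consistent with the mission: an observe7 task followed by
# an observe9 task, each completed before the next pickup.
sample = [{"pickup"}, {"event7"}, {"event9"},
          {"pickup", "observe9"}, {"event9"}]
result = check_prefix(sample)
```

A non-empty `pending` set at the end of a prefix is not a violation; it only means the current task has not been completed yet.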

The MDP $\mathcal{M}$ generated from the environment is shown in
Fig. 6. For this example, we have arbitrarily chosen values
for the probability transition function $P_m$. In practice,
probabilities of transition under actuation and measurement errors
can be obtained from experiments or accurate simulations
(see [15]). The number of states in the MDP $\mathcal{M}$ is $|S| = 15$.



Fig. 6. MDP $\mathcal{M}$ generated from the environment with given $U$, $A$, $P_o$
and $P_m$. The initial state $s_0$ is marked by an incoming arrow ($\iota(s_0) = 1$).

We generated the deterministic Rabin automaton $\mathcal{R}_\phi$ using
the ltl2dstar tool (see [23]). The number of states $|Q|$ is 52.
Thus, the product MDP $\mathcal{M}_{\mathcal{P}}$ has 780 states. For the DRA
generated, there is only one pair in $F$, i.e., $F = \{(L, K)\}$,
with 1 state in $L$ and 18 states in $K$. Thus, the number of
states in $L_{\mathcal{P}}$ is 15 and the number of states in $K_{\mathcal{P}}$ is 270.
There is one accepting maximum end component in $\mathcal{M}_{\mathcal{P}}$,
and it contains 17 states.
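The sizes quoted above follow from the product construction. Assuming, as is usual for such products, that each accepting set of the automaton is lifted to the product by pairing it with every MDP state (so that $L_{\mathcal{P}} = S \times L$ and $K_{\mathcal{P}} = S \times K$), the arithmetic is:

```latex
|S_{\mathcal{P}}| = |S| \cdot |Q| = 15 \cdot 52 = 780, \qquad
|L_{\mathcal{P}}| = |S| \cdot |L| = 15 \cdot 1 = 15, \qquad
|K_{\mathcal{P}}| = |S| \cdot |K| = 15 \cdot 18 = 270.
```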

Using the implementation of Alg. 1, we computed the
maximum probability of satisfying the specification from the
initial state and the optimal control strategy. The algorithm
ran in approximately 7 seconds on a MacBook Pro computer
with a 2.5 GHz dual core processor. For this example the
maximum probability is 1, implying that the corresponding
optimal control strategy almost surely satisfies $\phi$. To illustrate
the control strategy, a sample execution is shown in Fig. 7.

Fig. 7. A sample path of the robot with the optimal
control strategy. The word observed by the sample path is
pickup, event7, event9, {pickup, observe9}, event9, ....

VII. Conclusions and Final Remarks

We presented a method to generate a robot control strategy
that maximizes the probability of accomplishing a task. The
robot motion in the environment was modeled as a graph
and the task was given as a Linear Temporal Logic (LTL)
formula over a set of properties that can be satisfied at
the vertices with some probability. We allowed for noisy
sensors and actuators by assuming that a control action
enables several transitions with known probabilities. We
reduced this problem to one of generating a control policy
for a Markov Decision Process such that the probability of
satisfying an LTL formula over its states is maximized. We
then provided a complete solution to this problem by adapting
existing probabilistic model checking tools.

We are currently pursuing several future directions. We 
are looking at proposition observation models that are not 
independently distributed. These models arise when the 
current truth value of the proposition gives information about 
the future truth value. We are also looking at methods for 
optimizing the robot control strategy for a suitable cost 
function when costs are assigned to actions of an MDP. The 
second direction will build on our recent results on optimal 
motion planning with LTL constraints [27]. 
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